ABSTRACT

Sudden bearing failure can lead to fatal accidents and increases machine downtime. Neural Networks and Support Vector Machines (SVM) are widely used in rotating machinery fault diagnosis, while Random Forest (RF), which is based on the ensemble learning method, is relatively unknown in this field. The use of Micro Electro Mechanical Sensors (MEMS) for machinery fault diagnosis is receiving growing attention because they are low cost, compact and portable. In this paper, vibration signals are collected from a Rolling Element Bearing (REB) using a MEMS sensor. Statistical features are extracted from the wavelet packet coefficients of the vibration signals and used as input features for classification. A framework for comparing RF and SVM is presented to identify the better classifier for bearing fault diagnosis. RF emerged as the better classifier in terms of classification accuracy, especially with a small training set, which makes it a promising tool for bearing fault diagnosis.

1.  INTRODUCTION

Bearings are vital elements of rotary machinery across all industrial sectors. Bearing failure leads to major breakdowns and expensive downtime that should be avoided. Hence, intelligent bearing failure diagnosis and prognosis should be an integral part of asset maintenance and management in any industry using rotary machines [1-2]. Vibration analysis is widely preferred because it reveals essential information about the condition of the machine [3]. Acquiring vibration signals with conventional piezoelectric accelerometers is costly because of their size, their price and the cost of the associated signal conditioning circuitry. MEMS sensors are gaining significance because they are not only cheaper but also provide results comparable to those of a conventional accelerometer. Jyoti K. Sinha [5] tested integrated circuit piezoelectric (ICP) type accelerometers for reliability. ICP accelerometers from three well-known manufacturers were tested in the laboratory, and the accuracy of the sinusoidal response measurement was found to be satisfactory. Jardine et al. [6] discussed the use of multiple sensors and different techniques for data fusion, and showcased next-generation diagnostic and prognostic methods for condition monitoring (CM). Uhlmann et al. [7] presented an advanced wireless CM system for industrial units in which data acquisition, signal processing and classification are done using a MEMS sensor and a Raspberry Pi 2. Feldman et al. [8] showed the use of MEMS sensors for in situ CM work; their proposed method assisted in early detection of failure, fault diagnosis and an integrated diagnosis system.

Time and spectrum analysis methods are used in signal analysis to detect fault conditions. Time domain analysis provides a basic description of bearing health using statistical features such as mean, standard deviation and Root Mean Square (RMS). In spectrum analysis, vibration data is represented as a function of frequency; it is not suitable for analyzing transient vibration signals because it fails to reveal transient information, such as impacts, contained in them. These transient components contain vital information about machinery defects. Hence, the Wavelet Transform (WT) is commonly used to analyze them, as it provides a variable window technique that uses different time intervals to analyze high frequency and low frequency components. It divides the data into approximation and detail coefficients at variable scales, providing a more efficient tool for non-stationary signal analysis than the FFT [9]. WT works well for weak signals, denoising and singularity detection [10].
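The split into approximation and detail coefficients can be illustrated with a minimal sketch. It assumes the Haar wavelet and a single decomposition level, chosen here only for simplicity; the function name `haar_dwt` and the toy signal are hypothetical, and the sign convention of the detail coefficients varies between implementations.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Low-pass branch: scaled pairwise sums (slow trend of the signal).
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    # High-pass branch: scaled pairwise differences (abrupt changes such as impacts).
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

# Toy signal: constant level with a single impact at index 4.
x = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
approx, detail = haar_dwt(x)

# The transform is orthonormal, so the total energy is preserved.
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in approx) + sum(v * v for v in detail)
print(detail, energy_in, energy_out)
```

Only the detail coefficient that covers the impact is non-zero, which is why transient events stand out in the detail band.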

Shimada et al. [11] applied SVM to locate faults in fixed components. They first used variations in the natural frequency of the body to train the SVM model and then used the model to detect the fault location. The prime intention was to minimize the number of sensors needed to acquire the fault signal from the system. They found that this method adequately reduced the possibility of improper fault detection. Rojas et al. [12] used SVM for bearing fault diagnosis with vibration signals from four bearing conditions and concluded that SVM accuracy decreases as the data size decreases. Yan et al. [13] used SVM for rotor failure diagnosis. They observed that SVM was effective in identifying machinery failures, particularly in complex operating conditions, as there is a limit on the input parameters and the computation time is small. Zhong et al. [14] used SVM for fault identification in a gearbox. They designed a test setup to identify the most common gearbox failures, such as misalignment, wear and imbalance, and concluded that SVMs are capable of recognizing various failure types precisely. Yang et al. [15] investigated the use of the random forest (RF) algorithm in machinery fault diagnosis. They developed a hybrid model combining a genetic algorithm with RF to enhance the output of the RF. The performance of the method was evaluated using induction motor vibration data, and experimental data was used to validate it. Ferenc et al. [16] acquired vibration signals from bearings and applied signal processing techniques such as signal decomposition and denoising. They obtained eighteen statistical features from the healthy and faulty vibration signals using wavelets and gave these features as input to the RF. RF yielded a classification accuracy of 99.51% with one tree, which increased to 99.81% when the number of trees was increased to eight. Yao et al. [17] studied four types of rotor failure: unbalance, misalignment, surge and bearing failure. They compared the performance of a decision tree (DT) and RF and concluded that RF gave higher classification accuracy than DT. The authors verified the method by comparing the results with an SVM classifier and found that RF is a valid method for fault diagnosis of rotors. Patel et al. [18] built an RF classifier for the multiclass problem of mechanical faults in an induction motor bearing. They compared the results of RF with those of an artificial neural network (ANN) and concluded that RF is superior. Seera et al. [19] presented an intelligent system combining RF with a Fuzzy min-max neural network, termed FMM-RF, and applied it to ball bearing condition classification.

The literature reveals a number of classifiers, such as Decision Tree (DT), NN, SVM and naive Bayes. Model parameters such as the penalty factor and the kernel function affect SVM performance. A genetic algorithm or particle swarm optimization can be used to select the model parameters, at the cost of more computational time. Also, a single classifier often suffers from low classification accuracy or over-fitting. To address this issue, Random Forest, which consists of several decision trees, is used. RF also makes it possible to perform diagnosis without a feature selection task, which is useful because feature selection may over-compress the original data; as stated by Breiman, RF performance is superior when the data size is large. RF uses repetitive partitioning to build several trees and then aggregates their results. RF is also not influenced by surrounding noise [3, 15-17].

In this paper, vibration signals are obtained using an ADXL335 sensor for three states of the bearing, i.e. new or unused (N), defect on the Inner Race (IR) and defect on the Outer Race (OR), at shaft speeds of 355 and 622 rpm and a constant load of 1.7 kN. The raw signals are denoised, and eleven statistical features are obtained from the wavelet packet coefficients and fed to RF and SVM. The classification results are used to select the better classifier in this work. Figure 1 illustrates the methodology used in this paper.

Figure 1: Proposed Methodology for Bearing Fault Diagnosis


2.  WAVELET  TRANSFORM 


WT is categorized as Continuous Wavelet Transform (CWT), Discrete Wavelet Transform (DWT) and Wavelet Packet Transform (WPT). The CWT of a signal x(t) is obtained by a convolution operation between x(t) and the complex conjugate of the wavelet family, which is written as [9]

$$ CWT(s, \tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{1} $$

where $\psi^{*}(\cdot)$ is the complex conjugate of the scaled and shifted wavelet function $\psi(t)$, $s$ is the scale and $\tau$ is the shift. Discretizing the scale and shift not only reduces the redundancy of the data but also retains the information contained in the original signal. Dyadic scales ($s = 2^{j}$, $\tau = k\,2^{j}$) are used to achieve this [9-10].


DWT is derived from the discretization of CWT and is expressed as

$$ DWT(j, k) = \langle x(t), \psi_{j,k}(t) \rangle = \frac{1}{\sqrt{2^{j}}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - k\,2^{j}}{2^{j}}\right) dt \tag{2} $$

where the symbol $\langle \cdot, \cdot \rangle$ denotes the inner product operation. Figure 2 shows the basic steps involved in the decomposition.

Figure 2: Basic Decomposition Step

In WPT, a wavelet packet function W is a function of three integers (n, j, k), defined as

$$ W^{n}_{j,k}(t) = 2^{j/2}\, W^{n}(2^{j} t - k) \tag{3} $$

where the integers j and k are the indices of scale and translation, respectively, and the index n is called the modulation parameter [9]. The first two wavelet packet functions are the scaling function $\varphi(t)$ and the mother wavelet function $\psi(t)$, given by

$$ W^{0}(t) = \varphi(t) \tag{4} $$

$$ W^{1}(t) = \psi(t) \tag{5} $$

The remaining functions, for n = 2, 3, ..., are defined by the following recursive relationships

$$ W^{2n}(t) = \sqrt{2} \sum_{k} h(k)\, W^{n}(2t - k) \tag{6} $$

$$ W^{2n+1}(t) = \sqrt{2} \sum_{k} g(k)\, W^{n}(2t - k) \tag{7} $$

where h(k) and g(k) are the quadrature mirror filters associated with the predefined scaling and mother wavelet functions [10]. The wavelet packet coefficients of the signal x(t) can be computed using the following equation

$$ w^{n}_{j,k} = \langle x(t), W^{n}_{j,k}(t) \rangle = \int_{-\infty}^{\infty} x(t)\, W^{n}_{j,k}(t)\, dt \tag{8} $$

3.  EXPERIMENTAL  SETUP  AND  MEMS  SENSOR 


The raw signal is obtained from a customized bearing test setup, shown in Figure 3. The test setup has a shaft of 32 mm diameter supported between the test bearing and a fixed bearing. The shaft rotor assembly is driven by a 3-phase AC induction motor. The drive has a speed ratio of 2.25, which provides a variable speed from 0 to 1400 rpm. A hydraulic loading arrangement applies a vertical load on the test bearing. The ADXL335 is mounted on the test bearing unit and connected to NI PCI 6221 Data Acquisition (DAQ) hardware, which is in turn connected to a computer through cables. The test rig is allowed to run for some time, and the raw signals are then acquired with the DAQ board. The raw signals are sampled at 10,000 samples/sec using a customized LabVIEW virtual instrument (VI) programme and stored on the computer as .xls files [4].

Defects on bearing components can be seeded through water jet machining, drilling, or indentation by hammering. In this work, defects were simulated on the IR and OR of the 6206 bearing by indentation with a hammer. The parameters of the SKF 6206 bearing are shown in Table 1. The ADXL335 sensor provides an analogue output voltage proportional to acceleration: it converts the capacitance changes caused by acceleration into a voltage [4]. The sensor measures acceleration along three axes (X, Y and Z). It has built-in temperature compensation circuitry, so no additional circuitry is required to reduce temperature effects. The ADXL335 accelerometer has the following specifications: measurement range ±3 g; sensitivity 300 mV/g; frequency bandwidth 60 Hz for the X and Y terminals and 50 Hz for the Z terminal; power supply 3 V; noise performance 150 μg/√Hz rms. Figure 4 shows a photograph of the ADXL335 MEMS accelerometer.


Figure  4:  Photograph  of  ADXL335  MEMS  Accelerometer 

The SKF 6206 bearing specifications and its characteristic frequencies are given in Tables 1 and 2.

Table 1: SKF 6206 Bearing Parameters

Inner Diameter (mm)   Outer Diameter (mm)   Roller Number   Diameter of Ball (mm)   Angle of Contact (deg)   Pitch Diameter (mm)
25                    52                    9               7.93                    0                        39

Table 2: Characteristic Frequencies

Rotating Frequency (Hz)   Ball Pass Frequency Inner Race (Hz)   Ball Pass Frequency Outer Race (Hz)
23.33                     128.41                                87.65

3.1  Modification of ADXL335 Accelerometer for Bandwidth Selection

As per the specification of the ADXL335 accelerometer, the maximum bandwidth available is 60 Hz. In this work, the required maximum frequency bandwidth is 128 Hz (the ball pass frequency of the inner race in Table 2). This led to the modification of the commercial ADXL335 accelerometer. The required bandwidth is obtained by removing the existing capacitor and replacing it with a new capacitor of value 0.03 μF in series [20]. The vibration signals were acquired at 10k samples/sec for ten seconds. The N, IR and OR signals were collected at an operating load of 1.7 kN and varying shaft speeds.
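The effect of the capacitor change can be checked with a short calculation. The sketch below assumes the first-order RC relation for the output filter given in the ADXL335 datasheet, with a nominal 32 kΩ internal resistor (its tolerance is ignored), and assumes that the replacement capacitor is 0.03 microfarads. The function names are hypothetical.

```python
import math

# Assumption: nominal internal filter resistor of the ADXL335 outputs (datasheet value).
R_FILT_OHM = 32_000.0

def bandwidth_hz(c_farad):
    """-3 dB bandwidth of the first-order RC output filter."""
    return 1.0 / (2.0 * math.pi * R_FILT_OHM * c_farad)

def capacitor_for_bandwidth(f_hz):
    """Largest capacitor (in farads) that still gives a bandwidth of f_hz."""
    return 1.0 / (2.0 * math.pi * R_FILT_OHM * f_hz)

bw_modified = bandwidth_hz(0.03e-6)      # bandwidth with the 0.03 uF capacitor
c_max = capacitor_for_bandwidth(128.41)  # capacitor limit for the inner race frequency

print(f"bandwidth with 0.03 uF: {bw_modified:.1f} Hz")
print(f"largest capacitor for 128.41 Hz: {c_max * 1e6:.4f} uF")
```

Under these assumptions the modified sensor passes frequencies up to roughly 166 Hz, which covers the 128 Hz requirement; the same relation gives about 50 Hz for a 0.1 μF capacitor.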

3.2  Wavelet  Denoising 

It is hard to remove noise effectively using traditional filtering methods alone. These traditional denoising methods also require knowledge of some technical parameters [4, 21]. To overcome these barriers, wavelet-based denoising is used. Customised MATLAB codes are used to denoise the raw signal acquired from the test rig [4, 22]. Figure 5 shows the raw and denoised vibration signals for the three bearing conditions, and Figure 6 shows the energy values of the odd wavelet packet coefficients at level three considered in this work.

The energy and kurtosis values are computed using MATLAB codes. The kurtosis and energy of the raw and denoised signals at 622 rpm are shown in Table 3. Denoising is considered effective when the energy of the denoised signal is lower and its kurtosis is higher than those of the raw vibration signal [9-10]. This trend is clearly observed in Table 3. After denoising, OR has the highest kurtosis value, followed in descending order by IR and N. This is because the outer race is closer to the ADXL335 sensor, so the sensor captures its defect characteristic information more effectively than that of IR and N [24].
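The energy and kurtosis check described above can be sketched as follows. The authors' MATLAB denoising routine is not given, so this example makes its own assumptions: a single-level Haar transform, soft thresholding of the detail coefficients, the universal threshold with a median-based noise estimate, and energy defined as the mean square value. The synthetic signal and all function names are hypothetical.

```python
import math
import random
import statistics

def haar_step(x):
    """One-level orthonormal Haar decomposition into (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    a = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return a, d

def haar_inverse(a, d):
    """Inverse of haar_step."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) * s, (ai - di) * s])
    return x

def soft(c, t):
    """Soft thresholding: shrink the coefficient magnitude by t, keep the sign."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def denoise(x):
    a, d = haar_step(x)
    sigma = statistics.median(abs(c) for c in d) / 0.6745  # robust noise estimate
    t = sigma * math.sqrt(2.0 * math.log(len(x)))          # universal threshold
    return haar_inverse(a, [soft(c, t) for c in d])

def energy(x):
    return sum(v * v for v in x) / len(x)

def kurtosis(x):
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    return sum((v - m) ** 4 for v in x) / len(x) / var ** 2

# Synthetic stand-in for a faulty bearing signal: noise plus periodic impacts.
rng = random.Random(0)
raw = [rng.gauss(0.0, 0.05) for _ in range(1024)]
for i in range(64, 1024, 128):
    raw[i] += 1.0
den = denoise(raw)

print("energy  :", energy(raw), "->", energy(den))
print("kurtosis:", kurtosis(raw), "->", kurtosis(den))
```

Because the transform is orthonormal and soft thresholding only shrinks coefficients, the energy of the denoised signal cannot exceed that of the raw signal. The change in kurtosis depends on the signal and is best read from the printed values.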


Table 3: Raw and Denoised Signal Energy and Kurtosis

Sl. No.   Condition   E_raw    E_den   K_raw   K_den
1         N           0.02     0.018   3.17    3.31
2         IR          0.0608   0.055   5.15    7.80
3         OR          0.08     0.064   4.37    46.73

[Figure 5 panels: Normal, IR and OR vibration signals plotted against samples (×10^4); (a) Raw, (b) Denoised]

Figure  5:  Plots  of  (a)  Raw  Signals  and  (b)  Wavelet  Denoised  Vibration  Signals  for  the  Three 
Bearing  Conditions:  N,  IR,  and  OR,  at  a  Speed  of  622  rpm  and  Load  of  1.7  kN  [4] 


4.  WAVELET  TRANSFORM  AND  FEATURE  EXTRACTION 


WT can be implemented as CWT, DWT or WPT [4]. Among these, WPT is superior because it also decomposes the detail coefficients of the signal, thus overcoming the limitation of DWT [9-10]. Hence, WPT is used in this work. The noise-free signal is subjected to WPT and analysed up to level 3 with the 'db1' wavelet using customized MATLAB codes [22]. Figure 6 shows the energy values of the odd wavelet packet coefficients, computed using MATLAB codes [22]. An ascending trend is observed for wavelet packet node (3, 7), which is not found in the remaining packet nodes. Hence, in this work statistical features are extracted from this wavelet packet node. The denoised vibration signal is split into 10 segments of 10,000 samples each. Eleven statistical features are extracted from each segment of the signal. Table 4 lists the statistical features used in this work [4].
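A level 3 wavelet packet decomposition with the 'db1' (Haar) wavelet and the node energies can be sketched in a few lines. This is an illustration, not the authors' MATLAB code: the nodes are kept in natural order (child 2n from the low-pass filter, child 2n+1 from the high-pass filter), which is an assumption because the node numbering used in the paper is not stated. All function names and the toy signal are hypothetical.

```python
import math

def haar_split(x):
    """Split a coefficient list into low-pass and high-pass Haar children."""
    s = 1.0 / math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return low, high

def wavelet_packet(x, level):
    """Return the 2**level terminal nodes in natural order: node n -> (2n, 2n+1)."""
    nodes = [list(x)]
    for _ in range(level):
        children = []
        for node in nodes:
            low, high = haar_split(node)
            children.extend([low, high])
        nodes = children
    return nodes

def node_energy(coeffs):
    return sum(c * c for c in coeffs)

def split_segments(x, size):
    """Cut a signal into consecutive, non-overlapping segments of equal size."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]

# Toy signal that alternates every sample (the fastest oscillation possible).
x = [1.0, -1.0] * 8
nodes = wavelet_packet(x, 3)
energies = [node_energy(n) for n in nodes]
print(energies)

# A 10 s record at 10,000 samples/sec gives ten segments of 10,000 samples.
segments = split_segments([0.0] * 100_000, 10_000)
print(len(segments))
```

In natural order the alternating toy signal ends up entirely in node (3, 4) rather than (3, 7), which shows that the node index does not increase monotonically with frequency. The total energy across the eight nodes equals the energy of the input.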


[Figure 6 legend: N, IR, OR]


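Figure 6: Energy Values of the Odd Wavelet Packet Coefficients

Table 4: List of Statistical Features

Sl. No.   Feature              Formula
T1        Mean                 $\frac{1}{N}\sum_{n=1}^{N} x(n)$
T2        RMS                  $\sqrt{\frac{1}{N}\sum_{n=1}^{N} x(n)^{2}}$
T3        Standard deviation   $\sqrt{\frac{1}{N-1}\sum_{n=1}^{N} (x(n) - T1)^{2}}$
T4        Peak-Peak value      $\max x(n) - \min x(n)$
T5        Skewness             $\frac{1}{N}\sum_{n=1}^{N} \left(\frac{x(n) - T1}{T3}\right)^{3}$
T6        Kurtosis             $\frac{1}{N}\sum_{n=1}^{N} \left(\frac{x(n) - T1}{T3}\right)^{4}$
T7        Variance             $\frac{1}{N-1}\sum_{n=1}^{N} (x(n) - T1)^{2}$
T8        Crest Factor         $T4 / T2$
T9        Latitude Factor      $T4 / \left(\frac{1}{N}\sum_{n=1}^{N} \sqrt{|x(n)|}\right)^{2}$
T10       Impulse Factor       $T4 / T1$
T11       Log-Log Ratio        $\frac{1}{\log(T3)}\sum_{n=1}^{N} \log(|x(n)| + 1)$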
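The eleven features can be computed for one segment with a short function. The sketch below assumes the common textbook definitions for the moments (sample standard deviation with N-1, normalized third and fourth moments) and follows the ratios T4/T2 and T4/T1 as printed; the exact normalizations used by the authors are not recoverable from the table, so the values are illustrative. The function name and the test segment are hypothetical.

```python
import math

def statistical_features(x):
    """Return features T1..T11 for one signal segment (assumed definitions)."""
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    std = math.sqrt(var)
    peak_to_peak = max(x) - min(x)
    skew = sum(((v - mean) / std) ** 3 for v in x) / n
    kurt = sum(((v - mean) / std) ** 4 for v in x) / n
    crest = peak_to_peak / rms
    latitude = peak_to_peak / (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    # T10 and T11 are undefined when the mean is 0 or the standard deviation is 1.
    impulse = peak_to_peak / mean
    log_log = sum(math.log(abs(v) + 1.0) for v in x) / math.log(std)
    return {
        "T1": mean, "T2": rms, "T3": std, "T4": peak_to_peak,
        "T5": skew, "T6": kurt, "T7": var, "T8": crest,
        "T9": latitude, "T10": impulse, "T11": log_log,
    }

features = statistical_features([1.0, 2.0, 3.0, 4.0, 10.0])
for name, value in features.items():
    print(name, round(value, 4))
```

Applying this function to each of the 10 segments of every recording would give one row of eleven values per segment.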

5.  CLASSIFIERS 

Two machine learning techniques, namely RF and SVM, are used to assess the diagnostic capability of the acquired vibration signals.

5.1  Random Forest

RF, introduced by Breiman [24], uses decision tree (DT) based classification. It draws a large number of subsets from the original training set and builds a DT from each subset using the bagging technique [24]. Bagging helps to minimize variance and over-fitting, and improves the accuracy and stability of classification and regression models [24]. The bagging process randomly selects samples from the training set, and the decision trees are constructed from these bootstrap samples. Each tree classifier is termed a class predictor. The RF forms its decision by counting the votes of the class predictors for each class and then choosing the class with the most votes. Figure 7 shows the methodology adopted in RF.


Figure  7:  Work  Flow  of  RF  Algorithm  [26] 

The  steps  involved  in  the  working  of  RF  are  explained  as  follows. 


Table 5

Sl. No.   Description
1         Step 1: Consider a random sample of n observations from the data set, drawn with replacement from the m observations using the bootstrap resampling method. About 2/3 of the samples are selected, some of them repeatedly, and the remaining 1/3 are termed 'out of bag' (OOB). A new random selection of cases is performed for each constructed tree [25-26].
2         Step 2: Construct a DT (maximum size, without pruning) using the cases selected in the previous step. Every time a node needs to be split, consider only a subset of the total set of predictor variables, selected at random from all available predictor variables. In each split, some predictor variables are not considered, but predictors omitted in one split can be used by another split in the same tree.
3         Step 3: Repeat step 1 and step 2 to generate a forest, i.e. a large number of trees.
4         Step 4: To score a case, run it through each tree in the forest and note the predicted class. Use the predicted class of each tree as a 'vote' for the best class, and decide the output based on the majority of votes gathered by a class [26].
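The bootstrap sampling in step 1 and the majority vote in step 4 can be sketched without building any trees. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the tree votes are made up for the example, and tree construction (step 2) is omitted.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n indices with replacement; indices never drawn form the OOB set."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(42)
in_bag, out_of_bag = bootstrap_indices(42, rng)  # 42 training samples, as in this work
oob_fraction = len(out_of_bag) / 42
print("OOB fraction for this tree:", round(oob_fraction, 2))

# Hypothetical votes of five trees for one test sample.
tree_votes = ["IR", "N", "IR", "OR", "IR"]
predicted = majority_vote(tree_votes)
print("forest prediction:", predicted)
```

For large n the expected OOB fraction approaches 1/e, about 0.37, which is the "roughly one third" quoted in step 1; any single draw scatters around that value.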

5.2  Support  Vector  Machine 

SVM has proved successful in abnormality detection. It was introduced by Vapnik and optimizes a decision boundary by maximizing its distance to the nearest data points [12]. It is an effective technique for classification and prediction with a small number of observations. It is used in machine fault diagnosis and in bearing defect detection and diagnosis because it shows remarkable modelling and generalization performance compared with other methods such as neural networks. The basic idea of SVM is to map the original nonlinear data into a high-dimensional feature space, where a hyperplane is built to separate the data of two classes and to maximize the margin between itself and the closest points, the support vectors [23, 26].


Let the input data be $x_i$ (i = 1, 2, ..., n), where n is the total number of samples. The samples belong to a positive or a negative class, labelled $y_i = 1$ for the positive class and $y_i = -1$ for the negative class. In the case of linearly separable data, the separating hyperplane is given by

$$ f(x) = w^{T}x + b = \sum_{j=1}^{n} w_j x_j + b = 0 \tag{9} $$

where w is a vector with dimension n and b is a scalar. The position of the separating hyperplane is determined by the scalar b and the vector w. The decision function sign(f(x)) classifies the input data into either the positive or the negative class [27]. The separating hyperplane should satisfy the constraint

$$ y_i f(x_i) = y_i (w^{T} x_i + b) \geq 1, \quad i = 1, 2, 3, \ldots, n \tag{10} $$

From the geometry, the margin between the two classes is $2/\|w\|$, so maximizing the margin is equivalent to minimizing $\|w\|^{2}$. Considering the noise with slack variables $\xi_i$ and the error penalty C, the optimal separating hyperplane is obtained as the solution of the following optimization problem [27]:

$$ \text{Minimize} \quad \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \xi_i \tag{11} $$

$$ \text{subject to} \quad y_i (w^{T} x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n \tag{12} $$

where $\xi_i$ is the distance between the margin and a sample $x_i$ lying on the wrong side of the margin [27].
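The soft-margin objective can be evaluated directly for a given hyperplane. The sketch below does not train an SVM; it only computes the decision value, the slack of each sample and the resulting objective for hand-picked values of w, b and C. All names and data points are hypothetical.

```python
def decision_value(w, b, x):
    """f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def predict(w, b, x):
    """Class label from the sign of the decision value."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def slack(w, b, x, y):
    """Smallest slack that satisfies y * f(x) >= 1 - slack with slack >= 0."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

def soft_margin_objective(w, b, samples, labels, c):
    """0.5 * ||w||^2 + C * (sum of slacks)."""
    norm_sq = sum(wj * wj for wj in w)
    total_slack = sum(slack(w, b, x, y) for x, y in zip(samples, labels))
    return 0.5 * norm_sq + c * total_slack

w, b = [1.0, 0.0], 0.0
samples = [[2.0, 0.0], [-2.0, 1.0], [0.5, 0.0]]
labels = [1, -1, 1]

slacks = [slack(w, b, x, y) for x, y in zip(samples, labels)]
objective = soft_margin_objective(w, b, samples, labels, c=10.0)
print(slacks, objective)
```

The third sample is classified correctly but lies inside the margin, so it contributes a slack of 0.5. A larger C makes such margin violations more expensive relative to a wide margin.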

As shown in Figure 8, the dotted lines illustrate the margins whose separation is maximized, and the decision boundary is placed midway between these margins, separating the two classes.

When the linear SVM does not produce adequate results, the non-linear SVM is used. Here the feature vector x is mapped by a non-linear mapping $\phi(x)$ to a high-dimensional feature space, where the best hyperplane can be found. In the non-linear SVM, kernel functions are used so that the non-linear mapping is performed implicitly, without computing $\phi(x)$ explicitly. The kernel function is given by

$$ K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle \tag{13} $$

where $\langle \phi(x_i), \phi(x_j) \rangle$ is the inner product of the mapped vectors.

Figure 8: SVM Classification of Two Classes [26]

The kernel functions used in SVM are presented in Table 6.


Table 6: Formulation of Kernel Functions

Kernel Function   Formula
Linear            $x_i^{T} x_j$
Polynomial        $(\gamma\, x_i^{T} x_j + r)^{d},\ \gamma > 0$
RBF               $\exp(-\gamma \|x_i - x_j\|^{2}),\ \gamma > 0$
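The three kernels can be written as plain functions. The sketch assumes the common parametrization with a scale gamma, an offset r and a degree d, with the RBF kernel taken as exp(-gamma * ||xi - xj||^2); the default parameter values are arbitrary and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(xi, xj) + r) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    dist_sq = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * dist_sq)

xi, xj = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(xi, xj)
k_poly = polynomial_kernel(xi, xj, gamma=0.5, r=1.0, d=2)
k_rbf = rbf_kernel(xi, xj, gamma=0.5)
print(k_lin, k_poly, k_rbf)
```

The RBF kernel equals 1 for identical vectors and decays towards 0 as the vectors move apart, with gamma controlling how quickly.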

6.  RESULTS  AND  DISCUSSIONS 


This work evaluates the diagnostic performance of vibration signals acquired with a MEMS accelerometer using RF and SVM. The vibration signals were acquired at 1.7 kN load and shaft speeds of 355 and 622 rpm for three bearing conditions. The signals were denoised and then subjected to WPT. Eleven statistical features were extracted from the wavelet packet coefficients of each of the 10 segments per signal, giving 3 conditions × 2 speeds × 10 segments = 60 feature vectors; therefore, the dimension of the feature matrix is 60 × 11. These features were divided into 42 (70%) training samples and 18 (30%) test samples, which were used to construct the RF and SVM models and to test them, respectively.
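The bookkeeping behind the 60 feature vectors and the 42/18 split can be sketched as follows. How the authors actually partitioned the data is not stated, so a seeded random shuffle is assumed here; the record layout and the function name are hypothetical.

```python
import random

CONDITIONS = ["N", "IR", "OR"]
SPEEDS_RPM = [355, 622]
SEGMENTS_PER_SIGNAL = 10

# One record per (condition, speed, segment); each would carry 11 features.
records = [
    (condition, speed, segment)
    for condition in CONDITIONS
    for speed in SPEEDS_RPM
    for segment in range(SEGMENTS_PER_SIGNAL)
]

def train_test_split(rows, train_fraction, seed):
    """Shuffle the rows reproducibly and cut them into training and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test

train_rows, test_rows = train_test_split(records, 0.7, seed=1)
print(len(records), len(train_rows), len(test_rows))
```

A plain random split does not guarantee equal numbers of N, IR and OR samples in the test set; a stratified split would be needed for that.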

To attain high classification accuracy, the model parameters of each classifier need to be set to suitable values. In RF, the number of trees is the parameter considered to affect the performance. The SVM model for bearing fault classification was constructed using the one-against-one method with the RBF kernel function. The RF and SVM models were constructed and then tested on the test data to predict the conditions of the bearings. All eleven features were given as input to the SVM model. The better classifier is identified from the classification results.

The classification accuracy on the test data is 88.8% for RF and 72.22% for SVM, as shown in Table 7. The class-wise classification results on the test data for RF and SVM are given by the confusion matrices in Table 8 for the conditions N, IR and OR. The diagonal elements of a confusion matrix represent the correctly classified samples. In the RF matrix, the first element of the first row shows that 6 samples are correctly classified as 'N', whereas the second element in the same row shows that 1 sample is misclassified as 'IR'. Of the 18 test samples, 16 are correctly classified by RF, giving a classification accuracy of 88.8%. Similarly, SVM correctly classified 13 samples, giving a classification accuracy of 72.2%.
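The accuracies quoted above follow directly from the confusion matrices in Table 8. The sketch below recomputes them, assuming that each row holds the samples of one true class (in the order N, IR, OR) and each column one predicted class; the function names are hypothetical.

```python
# Confusion matrices from Table 8 (class order: N, IR, OR).
RF_MATRIX = [[6, 1, 0], [1, 5, 0], [0, 0, 5]]
SVM_MATRIX = [[5, 1, 0], [0, 3, 3], [0, 1, 5]]

def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def class_wise_accuracy(matrix):
    """Diagonal element of each row divided by the row total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

rf_accuracy = accuracy(RF_MATRIX)    # 16 of 18 samples
svm_accuracy = accuracy(SVM_MATRIX)  # 13 of 18 samples
print(f"RF : {100 * rf_accuracy:.2f}%  class-wise {class_wise_accuracy(RF_MATRIX)}")
print(f"SVM: {100 * svm_accuracy:.2f}%  class-wise {class_wise_accuracy(SVM_MATRIX)}")
```

The SVM matrix shows that most of its errors come from IR samples being labelled OR, a detail that the overall accuracy alone hides.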


Table 7: Classification Accuracies of RF and SVM

Accelerometer   RF Training Accuracy (%)   RF Test Accuracy (%)   SVM Training Accuracy (%)   SVM Test Accuracy (%)
MEMS            90.4                       88.8                   73.8                        72.22

Table 8: Confusion Matrix of RF and SVM

            RF                  SVM
       N    IR   OR        N    IR   OR
N      6    1    0         5    1    0
IR     1    5    0         0    3    3
OR     0    0    5         0    1    5

The performance of RF mainly depends on the number of trees. RF gives better results with a large number of trees but also takes more computational time. The relation between the number of trees and the test accuracy is presented in Table 9. As the number of trees was increased to a high number, for example 200 or 300, no over-fitting was observed, although a little rippling exists. Classification accuracy shows an ascending trend up to 300 trees and then decreases even though the number of trees increases.


Table 9: Number of Trees and Corresponding Test Accuracy

No. of Trees   Test Accuracy (%)
100            77.8
200            81.3
300            88.8
400            86.12
500            67.8
600            67.8

7.  CONCLUSIONS 

SVM has gained increasing importance, while RF is relatively unfamiliar in the bearing fault diagnosis area. In this work, an REB fault diagnosis procedure was presented in which a low-cost ADXL335 MEMS accelerometer was used to acquire vibration signals. The performance of RF was compared with that of the existing SVM technique. The results demonstrated the superiority of RF over SVM. RF is superior because it combines several decision trees, which avoids the problems associated with a single decision tree (DT), such as over-fitting or low classification accuracy. Also, RF does not require a feature selection task for fault diagnosis, which reduces the computation time.